Abstract

Researchers in mixed-initiative problem-solving have generally
viewed interaction between the user and the system as a form of
dialog, which provides an effective unifying framework for
multimodal systems. For mixed-initiative interaction through a
visual medium, however, an approach that exploits our visual
perceptual abilities and the benefits of direct manipulation
mechanisms is equally compelling. This paper explores the
possibility of communication between human planners and
intelligent planning systems via shared control of a
three-dimensional graphical user interface. We are currently
testing our early development efforts in the Visual Interaction
Dialog (VID) system, which supports agent and user manipulation
of camera placement for communicating plan structure and domain
information.


Introduction 

A view of human-computer interaction as dialog has come to
dominate research on mixed-initiative systems. A dialog-based
framework unifies the different types of communication and
interaction supported by a multimodal system. For example,
instructions entered via spoken input or typed commands or
direct manipulation can all be interpreted in a common
(symbolic) representation. The same applies to different modes
of output, whether generated speech, natural language
explanations, or any of a variety of graphical and tabular
visualizations. The dialog perspective has generally been
successful, resulting in a number of well-known systems in
mixed-initiative planning, including TRAINS (Ferguson & Allen
1996), TRIPS (Ferguson & Allen 1998), COLLAGEN (Rich & Sidner
1998), and others.

In mixed-initiative planning, users collaborate with software
agents to produce plans. Effective collaboration between human
planners and automated software requires that participants work
in areas where they perform best, use appropriate
representations for communication, and acquire/transfer
authority for planning tasks (Burstein & McDermott 1996). These
system goals, along with previous studies (Allen 1994; Ferguson
& Allen 1996; 1998), have motivated the use of dialog support
in planning systems.

While the dialog-based approach has proven effective, it is not
without its difficulties. User interface designers, for
example, have argued for direct manipulation as an alternative
to command line interaction for almost two decades. The central
point of disagreement is not about the weakness of a command
line interface in contrast to, say, the power and flexibility
of unrestricted natural language, but rather about whether
human-computer interaction is best viewed as a dialog or as
action in an environment. In this paper we explore some of the
issues raised by this alternative perspective, in which we
concentrate on the ability of an interactive environment to
constrain and guide the behavior of a human user as well as
provide guidance to an automated planner.

Our work has some of the flavor of the ecological view of
human-computer interaction (HCI) (Flach et al. 1995; J. Gibson
1979; St. Amant 1999). In human-computer interaction circles,
interface designers are encouraged to provide cues in their
environments that indicate how objects can be used, in order to
improve ease of use, reduce the need for instructions, and
enhance familiarity with the interface. These cues are
sometimes referred to as affordances. Ideally, the affordances
of an environment suggest appropriate responses at any point in
time, such that one is led through the most effective sequences
of actions toward one’s goals. The ecological perspective
suggests a few desirable properties for a mixed-initiative
system:

•  The system can adapt the environment such that some actions
can be carried out more easily than others.

•  The system can present the environment such that these
actions appear (visually, aurally, etc.) to be easier or more
direct than others.

•  The system can convey goals, state information, and at least
some task structure (e.g., focusing only on the objects related
to a task) by changing the environment.

•  Conversely,  if  the  user  makes  changes  to  the  environment, 
the  system  can  interpret  these  appropriately. 

•  The system is accommodating, in that its suggestions do not
rule out the user’s choosing other possibilities.

Note that these can in principle be achieved by a dialog-based
system, but the capabilities fit more naturally into a direct
manipulation interpretation of interaction.

Our approach focuses on the importance of visual stimuli to
human perception and understanding. We adopt the existing idea
of using a direct manipulation interface for dialog among
agents (Moo 1995). In our planning environment, currently under
development, system agents manipulate the location and
direction of cameras used for viewing three-dimensional (3-D)
plans overlaid onto domain specific representations. Although a
camera metaphor (Carroll, Mack, & Kellogg 1988) may not map
directly to a planner’s knowledge of the domain, it can exploit
innate features of human attention and perception (Banks &
Karjicek 1991; Kinchla 1991). Furthermore, not only does
placing 3-D representations of plan components into a planning
domain allow users to directly add, remove, and edit plan
components and relationships, but some domain dependent
characteristics become easily recognized, in what Woods (1991)
calls “design for information extraction.”

This paper is structured as follows. In the Related Work
section we discuss three areas of research that have influenced
our own: mixed-initiative planning, intelligent multimedia
systems, and visual perception for interactive data analysis.
Our work takes a step toward integration of disparate themes in
these areas. The next section describes AFS, the Abstract Force
Simulator in which our planning research takes place. In the
section that follows we discuss visual dialog for inter-agent
communications and coordination, concentrating on the potential
of a visual system to support flexible, interactive
visualizations, context registration, and dialog-based task
management. In the final section we describe a prototype 3-D
interface we have developed for AFS, called VID. The work in
this last section is preliminary. We do not yet have a full
implementation in which all components are integrated; we can
automatically produce a variety of examples such as the one
shown, on the fly, but they are currently canned in the sense
that they are not produced automatically by the planner, but
are instead controlled by an independent visualization module.

While it is possible to interpret input and output through a
visual medium as simply another mode of dialog, HCI researchers
have long argued that direct manipulation provides a
qualitatively different interaction experience. We intend to
incorporate findings from the literature on perception and
direct manipulation into a dialog framework, with the goal of
allowing shared control of a graphical user interface for
inter-agent communication in a mixed-initiative planning
system.

Related  work 

Our work on visual interaction dialog merges research from
several areas, including mixed-initiative planning, intelligent
multimedia systems, visual perception, and interactive data
analysis.

Mixed-initiative planning. A mixed-initiative planning system
can (perhaps inevitably) increase the amount of collaboration
required between the system and user. At times the system may
be employed as a tool for completing familiar and important
tasks. However, at other times the system will need to function
autonomously to complete unfamiliar or time-consuming tasks.
Although users delegate tasks to the system, they should not
have to surrender the ability to guide and review the
decision-making process. Work by James Allen (Allen 1994)
characterizes mixed-initiative planning on the basis of three
characteristics: the flexible and opportunistic exchange of
initiative, shifting focus of attention to meet user needs, and
providing mechanisms for maintaining shared implicit knowledge.
These three characteristics are closely related to cognitive
orientation, deep knowledge, intention sharing, and control
plasticity, components of Silverman’s model of collaboration
processes (Silverman 1992). Burstein and McDermott, in their
summary of mixed-initiative planning, additionally point out
that research in inter-agent communication should provide
flexible visualizations, context registration, and task
management support (Burstein & McDermott 1996). These latter
points are addressed in a later section.

Intelligent multimedia systems. Advances in graphics hardware
and software technologies help reduce the cost of generating
quality 3-D images, thus increasing the feasibility of
immersing users into dynamic virtual worlds. Recent intelligent
multimedia research has taken advantage of cinematography
heuristics to produce systems for automatic explanation
generation, intelligent tutoring, and other tasks (Feiner &
McKeown 1991; Smith & Bates 1989; Karp & Feiner 1990; Seligmann
& Feiner 1991; Gleicher & Witkin 1992; Phillips, Badler, &
Granieri 1992; Drucker & Zelter 1994; 1995; Christianson et al.
1996; He, Cohen, & Salesin 1996; Bares & Lester 1997; 1999).
Although some of these systems present a direct manipulation
interface to the user, the camera is not considered a method
for communication; instead, camera planning is simply used to
orient the user’s perspective in the virtual world. Our goal is
slightly different, in that we want to support dynamic
communication of task and domain information between the system
and the user, with shared control of camera placement and
direction. This may (we hope) have the additional benefit of
identifying some of the lower-level foundations, based on
principles of human attention, perception, and interaction, for
current heuristic approaches to intelligent multimedia.

Visual perception and data analysis. If we consider that sign
language and gestures have been used for communication for
millennia, visual communication can hardly be regarded as a new
concept. However, the principles behind effective communication
through pictures, graphs, and computer-generated images have
only much more recently been examined. Pioneering work from
Tufte (Tufte 1983), Cleveland (Cleveland 1985), and Friedhoff
(Friedhoff & Benzon 1989) provided fuel for a later generation
of work by Keller (Keller & Keller 1993), Kosslyn (Kosslyn
1994), Brown (Brown et al. 1995), and Bertoline (Bertoline et
al. 1997). We have a particular interest in effective graphic
communication in statistical data analysis systems. The Visage
environment (Roth et al. ), for example, utilizes a direct
manipulation interface allowing users to explore complex
relationships among data.

All of this work, based on visual perception, emphasizes the
correct usage of graphs and various forms of visual images for
communication. Effective use of the graphical forms presented
in a direct manipulation interface also relies on ecological
concepts. Typical direct manipulation interfaces rely on the
affordances provided by buttons, scrollbars, sliders, and other
widgets for interaction. Due to the successful integration of
these objects into common interfaces, some researchers have
suggested that mapping the appearance directly to an object in
the real world increases the likelihood that it will be
perceived (Carroll, Mack, & Kellogg 1988; W.Gaver 1991;
Anderson 1993). Psychological research on feature integration
theory, grouping, continuity, and attention has also
contributed to this area (Banks & Karjicek 1991; Kinchla 1991).
From our perspective, however, existing approaches do not
address the possibility of the user (or an intelligent
automated assistant) adjusting the viewing perspective;
distortions of objects or spatial relationships are ordinarily
an effect that the developers of visualization systems would
wish to avoid.

AFS 

Our work takes place in the context of AFS, an abstract force
simulator provided by Paul Cohen’s lab at the University of
Massachusetts (Atkin et al. 1998). AFS is a general-purpose
simulation system that supports experimentation with
interactive, distributed planning techniques and their
relationship to physical processes. AFS provides a physical
domain in which abstract agents (which for clarity we will call
“force units” or “forces”) can interact, based generally on
Newtonian physics. Forces and inanimate objects have mass,
size, and shape; they may be solid or permeable; they move with
variable friction over a domain-dependent surface; they apply
force to one another, causing damage/mass reduction. Figure 1
shows the existing 2-D interface to AFS, while Figure 2 shows
the architecture of the system with its various components.

In AFS’s Capture the Flag (CTF) domain, two teams of forces
move over a terrain, their travel constrained by mountains,
water, and forests. Each team is responsible for defending a
set of stationary flags, and successfully completes a scenario
by destroying the members of the opposing team or capturing all
of its flags. Figure 1 shows a sample scenario. In this domain,
as in all AFS domains, force units rely on a small set of
primitive physical actions: they may move from one location to
another and apply-force to other forces and objects such as
flags. These actions can be specialized and combined in various
ways to form higher level strategies, such as blocking a pass,
encircling a flag, attacking an opponent in a group, and so
forth. Plan execution and monitoring is provided by HAC, the
hierarchical planner at the center of the system.

We have taken steps toward mixed-initiative planning in AFS,
mixing a navigational metaphor with mechanisms for direct
manipulation (St. Amant 1997; St. Amant, Long, & Dulberg 1998).
The user can direct the low-level actions of the teams of
forces, and can view visualizations of decisions the planner
makes, such as the tasks a team has taken on and how its
members are assigned. Figure 1 shows a visualization of a
partial plan, in which some force units are assigned to defend
their flags. The figure also shows a plan browser that displays
a more abstract view of the planning process. As we discuss in
the next section, our current work extends this interface to a
3-D world, with the goal of providing the features described in
the Introduction.

Visual  dialog 

In VID, the Visual Interaction Dialog system, system agents
manipulate the location and direction of cameras used for
evaluation and editing in a three-dimensional (3-D) planning
environment. After positioning the camera, they may add,
remove, and edit hierarchical plan components overlaid onto
domain specific representations. The effectiveness of the
interaction between the user and the HAC planner depends on the
ability of the system to provide flexible visualizations,
context registration, and task management support (Burstein &
McDermott 1996).

We emphasize that the current system is under development; our
discussion in this section and the next is of the design and
early prototypes of components. AFS and HAC are robust and
support all the planning activity and object manipulation we
discuss. However, the figures illustrating our example in the
next section were generated programmatically rather than
entirely by hand, and their generation is not yet completely
automated. Two major tasks for future development are the
extension and refinement of prototypes for the various visual
components in VID and their integration into AFS.

Flexible, interactive visualizations. AFS is designed as a
general simulator of physical processes. Since it can simulate
many different domains, the visual representation of plan
components can exclude domain specific knowledge. To achieve
this, VID will display plan goals, sub-goals, primitive
actions, and their relationships as semi-transparent diamonds
and cylinders. The diamonds represent goals at various levels
in the plan hierarchy. Cylinders, connecting hierarchical goals
and primitive actions, represent the relationships between
them. These simple geometric shapes ease the requirements of
rendering hardware, allowing real-time manipulation and
feedback in the virtual world. Semi-transparent plan components
avoid total occlusion of other information from any single
viewing perspective. Additional information can be encoded into
the representations using other visual dimensions like color,
pattern, etc. For example, in the CTF domain VID will color
goals and their relationships according to the team and current
user selection. To avoid confusion, VID does not interchange
the meaning of domain specific assignments among the visual
dimensions. Encodings persist until another domain is loaded or
the human planner explicitly changes them. In complex domains
there are many important domain and plan elements needing
representation. If only a single dimension were used to
represent each of these elements, we would quickly surpass the
number of commonly used visual dimensions, or we might create a
multi-dimensional image too complex for comprehension.
Conveniently, VID can combine several dimensions to form a
glyph, an abstract visual feature, which represents one or more
domain characteristics.
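
The encoding rules in this paragraph can be sketched in code. The following is only an illustration under our own assumptions, not part of VID: the names EncodingRegistry, Glyph, and make_glyph are hypothetical, and the sketch shows one way a visual dimension could be kept from acquiring a second meaning while several dimensions are combined into a glyph.

```python
from dataclasses import dataclass, field


@dataclass
class EncodingRegistry:
    """Hypothetical registry: one visual dimension per domain attribute."""
    by_attribute: dict = field(default_factory=dict)

    def assign(self, attribute: str, dimension: str) -> None:
        # Refuse to give a dimension already in use a second meaning.
        for attr, dim in self.by_attribute.items():
            if dim == dimension and attr != attribute:
                raise ValueError(f"{dimension!r} already encodes {attr!r}")
        self.by_attribute[attribute] = dimension

    def reset(self) -> None:
        # Encodings persist until another domain is loaded.
        self.by_attribute.clear()


@dataclass
class Glyph:
    shape: str          # "diamond" for a goal, "cylinder" for a relationship
    visuals: dict       # visual dimension -> displayed value
    alpha: float = 0.5  # semi-transparent, to avoid total occlusion


def make_glyph(registry: EncodingRegistry, shape: str, attributes: dict) -> Glyph:
    # Combine several encoded attributes into a single glyph.
    visuals = {registry.by_attribute[name]: value
               for name, value in attributes.items()}
    return Glyph(shape, visuals)
```

For instance, after assigning "team" to "color" and "selected" to "pattern", a goal glyph for a selected blue-team goal carries both encodings, and a later attempt to assign a third attribute to "color" is rejected.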

Placement of these glyphs in VID’s 3-D planning space allows
the user to specify the spatial and temporal characteristics of
the plans. Figure 3 demonstrates how spatial and temporal
information is organized in VID’s 3-D planning world. The
spatial information is obtained by extrapolating from the glyph
to a point on one of the three surfaces used to represent up to
six dimensions in the physical domain. These domain surfaces
may vary both in scale and size. For domains requiring fewer
than six dimensions, the additional surface axes do not
correspond to a dimension. The point extrapolated to the
surface locates the plan component using two coordinates for
each plane, the longitude and latitude. Again, the longitude
and latitude of each plane may or may not map to a dimension in
the physical domain. In some instances it may be convenient to
have similar axes on multiple domain surfaces represent the
same physical dimension. For example, in the CTF domain it is
convenient to use the longitude of the secondary and tertiary
domain surfaces to represent time. The distance from the domain
surface to the plan component represents the amount of time
before the simulator achieves the plan fragment. Thus, the more
abstract components of hierarchical plans appear further away
from each surface in the 3-D plan space.
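
As a rough illustration of this mapping, the sketch below projects a glyph position onto three surfaces. It rests on assumptions that the text does not state: that the three domain surfaces are mutually orthogonal planes through the origin, and that the primary surface is the plane z = 0. The names SurfacePoint and project are hypothetical.

```python
from typing import NamedTuple


class SurfacePoint(NamedTuple):
    longitude: float
    latitude: float
    distance: float  # perpendicular distance from glyph to surface


def project(glyph_xyz):
    """Project a glyph position onto three orthogonal domain surfaces."""
    x, y, z = glyph_xyz
    return {
        # Primary surface (assumed plane z = 0): the terrain in CTF.
        "primary": SurfacePoint(longitude=x, latitude=y, distance=z),
        # Secondary and tertiary surfaces (assumed planes y = 0, x = 0).
        # Their longitude is the height z, which in CTF encodes time.
        "secondary": SurfacePoint(longitude=z, latitude=x, distance=y),
        "tertiary": SurfacePoint(longitude=z, latitude=y, distance=x),
    }
```

Under these assumptions a glyph at (3, 4, 10) sits over terrain position (3, 4), and both the distance from the primary surface and the longitude on the other two surfaces read 10, the time until the plan fragment is achieved.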

Although VID is flexible in representing multidimensional
planning spaces, this is of little use to human planners if
they cannot recognize critical features that dictate the
success or failure of the plan. To assist, VID’s direct
manipulation interface permits system agents to dynamically
alter the location and direction of cameras used for viewing
the planning space. VID typically constrains the interface by
providing one camera for the system agents. However, multiple
cameras could be used to compare specific features in distant
locations of the plan space. Each shared view permits agents to
communicate with one another. When one of the system agents
orients a particular camera, it allows the others to perceive
the plan space as it would (i.e., from a first-person
perspective). During the collaborative planning process human
users and software agents take turns serving as an audience or
as a director, focusing on areas of interest and eliminating
visual clutter. When VID’s software agents attempt to
illustrate key features, they position the camera according to
heuristics taken from visual perception. For convenience, a
single example from a CTF scenario is located at the end of
this section.

Context registration. To avoid confusion, agents should
communicate their changes as they construct a plan. Context
registration involves conveying areas of interest and
notification of new changes to collaborators. Clearly, human
planners can only add, remove, and edit plan components through
VID’s direct manipulation interface. As in most interfaces,
camera placement and component manipulation are the result of
keyboard or mouse actions, events that are easily recognized by
most windowing systems. The processing of these events then
relays the current state of the world to other interested
agents (software or remotely located human planners). Although
user modifications are always mediated through the system, it
might be possible for software agents to perform tasks outside
the viewing region of the shared camera. To avoid any
disparities of knowledge, VID’s fundamental principle requires
software agents to reorient the camera to view any changes they
make. During some automated tasks this additional camera
movement may distract planners. To correct this problem, users
may optionally specify the amount and type of information the
system agents should communicate.
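
One way to picture this relay is as a simple observer pattern. The sketch below is an assumption-laden illustration, not VID's implementation: the class ContextRegistry, its method names, and the single on/off user setting are all hypothetical stand-ins for the behavior described above.

```python
class ContextRegistry:
    """Hypothetical relay of plan changes to interested agents."""

    def __init__(self):
        self.listeners = []
        self.camera_target = None
        # User-adjustable: whether software changes move the shared camera.
        self.follow_software_changes = True

    def subscribe(self, callback):
        # Register an agent (software or remote human) for notifications.
        self.listeners.append(callback)

    def record_change(self, author, component, location, by_software):
        # Software agents must bring their changes into the shared view,
        # unless the user has asked for less camera movement.
        if by_software and self.follow_software_changes:
            self.camera_target = location
        # Every change is relayed, whether or not the camera moved.
        for notify in self.listeners:
            notify(author, component, location)
```

In this sketch the user's preference only suppresses the camera movement; the notification itself still reaches every subscriber, so no agent's knowledge of the plan falls behind.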

Dialog-based task management. Although multiple agents
collaborate through VID, software agents yield control of the
dialog to human planners. Any time VID interrupts their
planning activities by repositioning the camera, they can
easily return it to its previous position by activating one of
the camera controls. This memory feature provides a degree of
control plasticity so VID can achieve a simple, quick style of
interaction (Silverman 1992). Additionally, VID does not
attempt to change the location or orientation of the shared
camera while human agents manipulate its controls or plan
components. The flexibility incorporated into VID allows the
user to control the search through the possible plan space by
focusing the shared camera on smaller regions. Within these
focused regions users may specify plan objectives at a variety
of abstract levels and delegate the lower level details to
other system agents.
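
The two camera rules in this paragraph (the user can undo a system-initiated move, and the system never moves the camera while the user is manipulating it) can be sketched as follows. The SharedCamera class and its method names are hypothetical, and a camera pose is represented by an arbitrary value.

```python
class SharedCamera:
    """Hypothetical shared camera that gives the human user priority."""

    def __init__(self, pose):
        self.pose = pose
        self._previous = None     # pose before the last system move
        self.user_active = False  # True while the user is manipulating

    def system_move(self, pose) -> bool:
        # The system never moves the camera during user manipulation.
        if self.user_active:
            return False
        self._previous = self.pose
        self.pose = pose
        return True

    def user_move(self, pose) -> None:
        # A user move always succeeds and discards the stored pose.
        self.pose = pose
        self._previous = None

    def restore(self) -> bool:
        # "Memory feature": undo the system's last repositioning.
        if self._previous is None:
            return False
        self.pose, self._previous = self._previous, None
        return True
```

A single remembered pose is the simplest reading of the memory feature; a fuller design might keep a history of poses instead.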

An  example 

As a simple demonstration of the previously discussed
interaction it will help to walk through an example from the
CTF domain. Figure 4 shows the initial settings for a scenario
in this domain. Forces are represented using colored spheres.
The color of the spheres and flags denotes team membership.
Forces, flags, and terrain are represented on the primary
domain surface.

Each of the planning decisions we discuss in our example can be
made by the planner or the user, the interaction managed with
conventional direct manipulation mechanisms, including icon,
menu, and button selections. For example, the system can accept
instructions from the user concerning which flags should be
targeted; alternatively, the user might ask for a suggestion by
pressing a button. In either case the planner generates plans
on its own to determine its goals, possibly deferring its
execution of actions to the user’s decisions. For simplicity,
we will assume in our discussion that the system makes all the
planning decisions, each approved by the user, and the goal of
the interaction is to convey its planning intentions to the
user.

Figure 4: Initial state in a CTF scenario

The system begins by constructing a plan to attack one of the
opponent’s forces defending a flag. VID, along with the
information provided by the planner, conveys the goal/task
combination by placing the camera at a position indicating a
high likelihood of mounting a successful attack. This is shown
in Figure 5. In addition to conveying the selection of a
specific target, the figure shows the planner’s selection of a
specific offensive force for the task. Even limiting VID’s
interaction to camera placement, we find some useful benefits
in the richness of visual cues. For example, the system shows
the scope of the action: off-screen forces are implicitly
considered irrelevant. Even the speed at which VID moves the
camera position into place can influence the user’s assessment
of the required tempo of the action. Thus with a single camera
positioning, the system can convey an acting force, a target
object, intervening forces, and a good deal of further implicit
information. This conciseness is possible partly due to the
simplicity of the environment, but it is also because the
system relies on the power of our visual interpretation.

In the completed system, VID will additionally add a goal icon
in the plan space above the final location. The height of the
goal above the primary domain surface will indicate the amount
of time until the goal is reached. Initially, the system
estimates the amount of time required for goal completion.
After establishing this goal the user can adjust the plan tempo
by dragging the height of the goal up or down. The higher it is
the slower the tempo; a lower goal icon implies a faster tempo.
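
The relation between goal height and tempo can be made concrete with a small calculation. This is only a sketch under our own assumptions: that height maps linearly to time, and that tempo can be summarized as the average speed a force needs in order to arrive on time. The function name and the seconds_per_unit_height parameter are hypothetical.

```python
def required_speed(path_length: float, goal_height: float,
                   seconds_per_unit_height: float = 1.0) -> float:
    """Average speed needed to reach the goal in the time its height encodes."""
    if goal_height <= 0:
        raise ValueError("goal must sit above the primary domain surface")
    # Assumed linear scale: height above the surface -> time until the goal.
    time_allowed = goal_height * seconds_per_unit_height
    return path_length / time_allowed
```

With this reading, dragging a goal from height 10 to height 20 halves the required speed for the same path, which matches the statement that a higher goal icon means a slower tempo.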

Let’s complicate the situation. Assume that the planner has
formed a plan in which the main attack is provided by a single
force unit, but that its offensive power is insufficient to
overcome its opponent’s defense. To convey this, VID first
identifies the need for further refinement of the plan (a flaw,
in an informal sense), and shows it to the user as in Figure 6.
VID positions the camera so the user can make a comparison
between the two forces. Based on perceptual heuristics, VID
zooms the camera in to eliminate as much distracting
information as possible. During the zoom, the camera also pans
so the two forces are shown along a common baseline, to allow
comparison of aligned distances (Cleveland 1985). The planner’s
solution to this mismatch is to select another force unit for
assistance. The planner conveys this decision by repositioning
the camera as it did for the first force unit, showing the new
unit’s contribution to the main action.
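
The zoom-and-pan step can be illustrated with a little geometry. The sketch below is not VID's heuristic; it assumes a top-down 2-D simplification, a pinhole camera with a known horizontal field of view, and a hypothetical function name frame_pair. It places the camera on the perpendicular bisector of the segment joining the two forces, so that both are equidistant from the camera and appear side by side, and just far enough back that both fit in view.

```python
import math


def frame_pair(a, b, fov_deg=60.0, margin=1.2):
    """Return (eye, look_at) so that points a and b just fit in view."""
    (ax, ay), (bx, by) = a, b
    dx, dy = bx - ax, by - ay
    separation = math.hypot(dx, dy)
    if separation == 0:
        raise ValueError("the two forces must be at different locations")
    # Look at the midpoint between the two forces.
    mid = ((ax + bx) / 2, (ay + by) / 2)
    # Unit vector perpendicular to the line joining the forces.
    nx, ny = -dy / separation, dx / separation
    # Back off until half the (padded) separation fills half the view.
    half_width = separation / 2 * margin
    distance = half_width / math.tan(math.radians(fov_deg) / 2)
    eye = (mid[0] + nx * distance, mid[1] + ny * distance)
    return eye, mid
```

Zooming in as far as this bound allows removes as much of the surrounding scene as possible while keeping both forces visible; the margin parameter is an assumed padding factor.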


Part of the power of this approach is that we are able to
exploit the physical nature of planning with AFS. It might seem
that for some purposes the interaction is at too low a level,
however: how can one manage abstractions, such as some number
of forces occupying some area? Fortunately, AFS supports
aggregation of objects and spatial regions. The planner or the
user can group and characterize agents, for example, to allow
visualizations at different levels of abstraction.


Figure  6:  VID  conveys  a  problem  with  the  basic  plan 


Conclusion 

In the previous example we saw how VID’s shared manipulation of
the camera is used for communication among various system
agents. VID makes this style of interaction possible by
providing flexible, interactive visualization based on visual
perception, context registration, and dialog-based task
management.

The visual approach has limitations, and we do not propose it
as an exclusive modality for interaction. For example, VID does
not provide a direct means for users to generate system
queries, and it is not entirely clear how visualizations can
address temporal reasoning. However, we believe the approach
has significant promise. We hope that the completed system,
like the Magic Lens filters (Stone, Fishkin, & Bier 1995;
Fishkin & Stone 1995), will offer users a quick, easy-to-use
interface with which they can find answers to many of their
questions.

Acknowledgments 

The  graphical  interface  could  not  have  been  implemented 
without  the  assistance  and  donated  code  of  John  Wiseman. 
Our  work  with  AFS  has  been  helped  greatly  by  interaction 
with  Paul  Cohen,  David  Westbrook,  and  Marc  Atkin.  This 
research  was  supported  by  ARPA/Rome  Laboratory  under 
contract  F30602-97-1-0289.  The  U.S.  Government  is  autho¬ 
rized  to  reproduce  and  distribute  reprints  for  governmental 
purposes  notwithstanding  any  copyright  notation  hereon.

References 

Allen,  J.  F.  1994.  Mixed  initiative  planning:  Po¬ 
sition  paper.  [WWW  document].  Presented  at  the 
ARPA/Rome  Labs  Planning  Initiative  Workshop.  URL 
http://www.cs.rochester.edu/research/trains/mip/. 
Anderson,  B.  1993.  Graphical  interfaces  considered 
as  representations  of  the  real  world:  Implications  of  an 
affordances-based  model.  In  Valenti,  S.  S.,  and  Pittenger, 
J.  B.,  eds.,  Studies  in  Perception  and  Action  II,  89-93.
Lawrence  Erlbaum. 

Atkin,  M.;  Westbrook,  D.  L.;  Cohen,  P.  R.;  and  Jorstad, 
G.  D.  1998.  AFS  and  HAC:  Domain-general  agent  simulation
and  control.  In  AAAI-98  Workshop  on  Software  Tools  for 
Developing  Agents,  89-95. 

Banks,  W.  P.,  and  Krajicek,  D.  1991.  Perception.  In  Annual
Review  of  Psychology,  305-331.  Annual  Reviews. 

Bares,  W.  H.,  and  Lester,  J.  C.  1997.  Realtime  generation 
of  customized  3d  animated  explanations  for  knowledge- 
based  learning  environments.  In  Proceedings  of  the  Four¬ 
teenth  National  Conference  on  Artificial  Intelligence,  347- 
354. 

Bares,  W.  H.,  and  Lester,  J.  C.  1999.  Intelligent  multi¬ 
shot  visualization  interfaces  for  dynamic  3d  worlds.  In 
Proceedings  of  the  1999  International  Conference  on  In¬ 
telligent  User  Interfaces,  119-126. 

Bertoline,  G.  R.;  Wiebe,  E.  N.;  Miller,  C.;  and  Mohler, 
J.  L.  1997.  Technical  graphics  communications  (2nd  Ed.). 
McGraw-Hill. 


Brown,  J.  R.;  Earnshaw,  R.;  Jern,  M.;  and  Vince,  J.  1995. 
Visualization:  Using  computer  graphics  to  explore  data 
and  present  information.  Wiley. 

Burstein,  M.  H.,  and  McDermott,  D.  V.  1996.  Issues  in 
the  development  of  human-computer  mixed-initiative  plan¬ 
ning.  In  Cognitive  Technology:  In  Search  of  a  Humane 
Interface,  285-303.  Elsevier. 

Carroll,  J.  M.;  Mack,  R.  L.;  and  Kellogg,  W.  A.  1988. 
Interface  metaphors  and  user  interface  design.  North  Hol¬
land.  45-65.

Christianson,  D.  B.;  Anderson,  S.  E.;  He,  L.  W.;  Salesin, 
D.  H.;  Weld,  D.;  and  Cohen,  M.  F.  1996.  Declarative  cam¬ 
era  control  for  automatic  cinematography.  In  Proceedings 
of  the  Thirteenth  National  Conference  on  Artificial  Intelli¬ 
gence,  148-155. 

Cleveland,  W.  S.  1985.  The  elements  of  graphing  data. 
Wadsworth. 

Drucker,  S.,  and  Zeltzer,  D.  1994.  Intelligent  camera  con¬
trol  in  a  virtual  environment.  In  Proceedings  of  Graphics
Interface,  190-199. 

Drucker,  S.,  and  Zeltzer,  D.  1995.  Camdroid:  A  system  for
implementing  intelligent  camera  control.  In  Proceedings  of
the  1995  Symposium  on  Interactive  3D  Graphics,  139-144.

Feiner,  S.  K.,  and  McKeown,  K.  1991.  Automating  the 
generation  of  coordinated  multimedia  explanations.  IEEE 
Computer  24(10):33-41.

Ferguson,  G.,  and  Allen,  J.  1996.  Trains-95:  Towards  a 
mixed-initiative  planning  assistant.  In  Proceedings  of  the 
Third  International  Conference  on  Artificial  Intelligence 
Planning  Systems,  70-77.

Ferguson,  G.,  and  Allen,  J.  1998.  Trips:  An  intelligent 
integrated  problem-solving  assistant.  In  Proceedings  of  the 
Fifteenth  National  Conference  on  Artificial  Intelligence.

Fishkin,  K.,  and  Stone,  M.  C.  1995.  Enhanced  dynamic 
queries  via  movable  filters.  In  Proceedings  of  the  CHI  ’95 
Conference,  415-420.  ACM  Press. 

Flach,  J.;  Hancock,  P.;  Caird,  J.;  and  Vicente,  K.  1995.
Global  perspectives  on  the  ecology  of  human-machine  sys¬ 
tems.  Lawrence  Erlbaum. 

Friedhoff,  R.  M.,  and  Benzon,  W.  1989.  Visualization:  The 
second  computer  revolution.  Abrams. 

Gleicher,  M.,  and  Witkin,  A.  1992.  Through-the-lens  cam¬
era  control.  Computer  Graphics  26(2):331-340.

He,  L.  W.;  Cohen,  M.  F.;  and  Salesin,  D.  H.  1996.  The  vir¬ 
tual  cinematographer:  a  paradigm  for  automatic  real-time 
camera  control  and  directing.  In  Proceedings  of  the  ACM 
SIGGRAPH  '96,  217-224.  ACM  Press. 

Gibson,  J.  J.  1979.  The  Ecological  Approach  to  Visual  Per¬
ception.  Houghton  Mifflin.

Karp,  P.,  and  Feiner,  S.  1990.  Issues  in  the  automated
generation  of  animated  presentations.  In  Proceedings  of 
Graphics  Interface  ’90,  39-48. 

Keller,  P.  R.,  and  Keller,  M.  M.  1993.  Visual  Cues.  IEEE 
Press. 


Kinchla,  R.  A.  1991.  Attention.  In  Annual  Review  of 
Psychology,  711-742.  Annual  Reviews.

Kosslyn,  S.  M.  1994.  Elements  of  graph  design.  Freeman. 

Moore,  J.  D.  1995.  Participating  in  Explanatory  Dialogs.  MIT  Press,
chapter  A  direct  manipulation  interface  for  user  feedback.

Phillips,  C.  B.;  Badler,  N.;  and  Granieri,  J.  1992.  Auto¬
matic  viewing  control  for  3d  direct  manipulation.  In  Pro¬
ceedings  of  the  1992  Symposium  on  Interactive  3D  Graph¬
ics,  71-74.

Rich,  C.,  and  Sidner,  C.  L.  1998.  COLLAGEN:  a  collabo¬ 
ration  manager  for  software  interface  agents.  User  Model¬ 
ing  and  User-Adapted  Interaction  8(3/4). 

Roth,  S.  F.;  Chuah,  M.  C.;  Kerpedjiev,  S.;  and  Kolojejchick,
J.  A.  Toward  an  information  visualization
workspace:  combining  multiple  means  of  expression. 
Human-Computer  Interaction. 

Seligmann,  D.,  and  Feiner,  S.  1991.  Automated  genera¬ 
tion  of  intent-based  3d  illustrations.  Computer  Graphics 
25(4):123-132.

Silverman,  B.  G.  1992.  Human-computer  collaboration. 
Human-Computer  Interaction  7:165-196. 

Smith,  S.,  and  Bates,  J.  1989.  Towards  a  theory  of  narrative 
for  interactive  fiction.  School  of  Computer  Science  CMU- 
CS-89-121,  Carnegie  Mellon  University. 

St.  Amant,  R.;  Long,  T.;  and  Dulberg,  M.  S.  1998.  Eval¬ 
uation  in  a  navigation  testbed.  Knowledge-Based  Systems 
11(1):61-70.

St.  Amant,  R.  1997.  Navigation  and  planning  in  a  mixed- 
initiative  user  interface.  In  Proceedings  of  the  Fifteenth  Na¬ 
tional  Conference  on  Artificial  Intelligence.  AAAI  Press. 
64-69. 

St.  Amant,  R.  1999.  User  interface  affordances  in  a  plan¬ 
ning  representation.  Human  Computer  Interaction.  In 
press. 

Stone,  M.  C.;  Fishkin,  K.;  and  Bier,  E.  A.  1995.  The 
movable  filter  as  a  user  interface  tool.  In  Proceedings  of 
the  CHI  ’94  Conference,  306-312.  ACM  Press. 

Tufte,  E.  R.  1983.  The  Visual  Display  of  Quantitative  In¬ 
formation.  Graphics  Press. 

Gaver,  W.  W.  1991.  Technology  affordances.  In  Proceed¬
ings  of  the  CHI’91  Conference,  79-84.  ACM  Press.

Woods,  D.  D.  1991.  Human-Computer  Interaction  and 
Complex  Systems.  Academic  Press,  chapter  The  cognitive 
engineering  of  problem  representations,  169-188. 


